general knowledge question


ICL Optimized Fragility

arXiv.org Artificial Intelligence

In-context learning (ICL) guides are known to improve task-specific performance, but their impact on cross-domain cognitive abilities remains unexplored. This study examines how ICL guides affect reasoning across different knowledge domains using six variants of the GPT-OSS:20b model: one baseline model and five ICL configurations (simple, chain-of-thought, random, appended text, and symbolic language). The models were subjected to 840 tests spanning general knowledge questions, logic riddles, and a mathematical olympiad problem. Statistical analysis (ANOVA) revealed significant behavioral modifications (p < 0.001) across ICL variants, demonstrating a phenomenon termed "optimized fragility." ICL models achieved 91-99% accuracy on general knowledge tasks while showing degraded performance on complex reasoning problems, with accuracy dropping to 10-43% on riddles compared to 43% for the baseline model. Notably, no significant differences emerged on the olympiad problem (p = 0.2173), suggesting that complex mathematical reasoning remains unaffected by ICL optimization. These findings indicate that ICL guides create systematic trade-offs between efficiency and reasoning flexibility, with important implications for LLM deployment and AI safety.
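For readers unfamiliar with the statistical test named in the abstract, the following is a minimal sketch of a one-way ANOVA F statistic comparing per-question correctness across six model variants. It is not the authors' analysis code: the function name `one_way_anova_f`, the dictionary `scores`, and every 0/1 value in it are hypothetical placeholders invented for illustration, and the sketch stops at the F statistic rather than computing a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic placeholder data (1 = correct, 0 = incorrect), NOT results from the study.
# The variant names follow the abstract; the values are made up.
scores = {
    "baseline":         [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    "simple":           [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "chain-of-thought": [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    "random":           [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    "appended text":    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "symbolic":         [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
}

f_stat, df_between, df_within = one_way_anova_f(list(scores.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

A large F means the variant means differ by more than the within-variant noise would suggest. Converting F to a p-value requires the F distribution, which a statistics library would normally supply.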


Anki Vector robot review: A magnetic personality covers a lack of smarts

PCWorld

If I were reviewing Anki's Vector as just another smart speaker or Alexa-enabled device, it would be hard to recommend. It takes too long to answer, has a limited set of skills (and Skills), and often won't respond to its wake word until the second or third attempt. But the thing is, even with all its flaws, Vector is just so darn likable. When it's not responding to your queries, Vector sleeps (and snores). He reacts to loud noises.


Are YOU smart enough to pass this IQ test?

Daily Mail - Science & tech

This online IQ quiz promises to challenge even the brightest players with its mind-boggling brainteasers. Indeed, the 10-question test is so tricky that creator Cody Cross claims only those who hold a PhD will be able to score full marks. The test, which was shared on Playbuzz, combines word puzzles with general knowledge questions, covering topics ranging from maths to art. Commenting on the quiz, Mr Cross said superior analytical and memory skills were needed to figure out the correct response. However, he does offer players some help, providing multiple-choice answers beneath each question.